feat: add read_tabix and read_bam_references convenience functions by pkerpedjiev · Pull Request #171 · abdenlab/oxbow

pkerpedjiev · 2026-03-15T22:43:23Z

- `read_tabix`: queries a BGZF tabix-indexed file for records in a genomic region, returning Arrow IPC bytes with `chrom`/`start`/`end`/`raw` columns. Accepts file paths and file-like objects; the index can be a `.tbi`/`.csi` file path or a file-like object.
- `read_bam_references`: reads reference sequence names and lengths from a BAM file header, returning Arrow IPC bytes with `name`/`length` columns. Useful for building chromsizes without scanning the full file.

These functions maintain backward compatibility with clients that relied on the same API in earlier oxbow versions (e.g. HiGlass/clodius).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
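To make the described query semantics concrete, here is a minimal pure-Python sketch of what "records in a genomic region" with `chrom`/`start`/`end`/`raw` output means. It assumes BED-style lines (0-based, half-open coordinates in the first three columns), samtools-style region strings (`chr`, `chr:beg`, `chr:beg-end`, 1-based inclusive), and that `raw` holds the full original line; the thread does not say what `raw` contains. `parse_region`, `query_region` and `Row` are hypothetical names, not oxbow APIs, and no Arrow encoding is done.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One output record, shaped like the chrom/start/end/raw columns (assumed)."""
    chrom: str
    start: int
    end: int
    raw: str


# samtools/tabix-style region: "chr", "chr:beg" or "chr:beg-end" (1-based, inclusive).
_REGION = re.compile(r"^([^:]+)(?::([\d,]+)(?:-([\d,]+))?)?$")


def parse_region(region: str) -> Tuple[str, int, Optional[int]]:
    """Return (chrom, start, stop) as 0-based half-open bounds; stop=None means open-ended."""
    m = _REGION.match(region)
    if m is None:
        raise ValueError(f"unparseable region: {region!r}")
    chrom, beg, end = m.groups()
    start = int(beg.replace(",", "")) - 1 if beg else 0
    stop = int(end.replace(",", "")) if end else None
    return chrom, start, stop


def query_region(lines: Iterable[str], region: str) -> Iterator[Row]:
    """Yield BED-like lines whose [start, end) interval overlaps the region."""
    chrom, start, stop = parse_region(region)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and header/comment lines
        fields = line.split("\t")
        c, s, e = fields[0], int(fields[1]), int(fields[2])
        # Half-open overlap test: [s, e) intersects [start, stop).
        if c == chrom and e > start and (stop is None or s < stop):
            yield Row(c, s, e, line)
```

A real tabix query uses the `.tbi`/`.csi` index to seek directly to the relevant BGZF blocks instead of scanning every line; this sketch only illustrates the overlap test and the output shape.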

nvictus · 2026-03-16T14:17:19Z

Thanks, @pkerpedjiev !

For the BAM references, doesn't this already do the job? The same property exists for all data sources.

```python
chromsizes = ox.from_bam(...).chrom_sizes
```
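A chromsizes file, as consumed by tools like HiGlass/clodius, is just tab-separated name/length lines. The thread does not state what type `chrom_sizes` returns, so this sketch assumes only that it can be viewed as (name, length) pairs, either a mapping or a list of tuples. `write_chromsizes` is a hypothetical helper, not part of oxbow.

```python
import collections.abc
import io
from typing import Iterable, Mapping, TextIO, Tuple, Union

# Either {"chr1": 248956422, ...} or [("chr1", 248956422), ...].
Sizes = Union[Mapping[str, int], Iterable[Tuple[str, int]]]


def write_chromsizes(sizes: Sizes, out: TextIO) -> None:
    """Write one 'name<TAB>length' line per reference, preserving input order."""
    pairs = sizes.items() if isinstance(sizes, collections.abc.Mapping) else sizes
    for name, length in pairs:
        out.write(f"{name}\t{int(length)}\n")


if __name__ == "__main__":
    # Demo with literal values; swap in real (name, length) pairs as needed.
    buf = io.StringIO()
    write_chromsizes({"chr1": 248956422, "chrM": 16569}, buf)
    print(buf.getvalue(), end="")
```

If `chrom_sizes` does yield such pairs, it could be passed straight in; otherwise a one-line conversion to a dict or list of tuples would be needed first.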

For tabix/CSI-indexed files, the BED datasource already does what you need (`from_bed(..., bed_schema="bed3+", index="...")`), as long as chrom, start, end are the first 3 columns, as the BED standard requires. With the last PR you can also apply custom type parsing to extended BED columns, so the parsing happens on the Rust side (`bed_schema=("bed3", {"foo": "int", "bar": "string"})`).
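As an illustration of what a schema like `("bed3", {"foo": "int", "bar": "string"})` expresses, the following pure-Python sketch parses the BED3 core and then converts each extended column according to a name-to-type mapping. This only mimics the idea in Python; oxbow does the real parsing in Rust, and the converter table and `parse_bed3_plus` are hypothetical. Only `"int"` and `"string"` come from the comment above.

```python
from typing import Any, Callable, Dict, Mapping

# Hypothetical type-name -> converter table. "int" and "string" come from the
# example in the thread; "float" is an illustrative extra, and oxbow's own
# accepted type names may differ.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "string": str,
}


def parse_bed3_plus(line: str, extra: Mapping[str, str]) -> Dict[str, Any]:
    """Parse chrom/start/end from the first 3 fields, then type the extra fields in order."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, Any] = {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
    }
    # Extended columns follow the BED3 core, matched to the mapping in insertion order.
    for (name, type_name), value in zip(extra.items(), fields[3:]):
        record[name] = CONVERTERS[type_name](value)
    return record
```

For example, `parse_bed3_plus("chr1\t10\t20\t7\tgeneA", {"foo": "int", "bar": "string"})` gives `foo` as the integer `7` and `bar` as the string `"geneA"`.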

DataSources are the preferred API for oxbow, returning iterators that expose record batches from Rust to Python with zero copy.

If I were to add anything, it would be a more generic BED-like TSV reader for the tabix use case where chrom, start, and end are not the first 3 fields, which tabix allows.
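A minimal pure-Python sketch of such a reader, where the chrom/start/end columns can sit anywhere. Column numbers are 1-based and modeled on tabix's `-s`/`-b`/`-e` options, `zero_based` on `-0`, and `meta_char` on `-c`; the defaults match tabix's GFF preset (`-s1 -b4 -e5`, 1-based coordinates). `read_tabix_like` and `Interval` are hypothetical, not an oxbow API, and the sketch works on already-decompressed text without using the index.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # normalized to 0-based, half-open
    end: int
    raw: str


def read_tabix_like(
    lines: Iterable[str],
    seq_col: int = 1,
    begin_col: int = 4,
    end_col: int = 5,
    zero_based: bool = False,
    meta_char: str = "#",
) -> Iterator[Interval]:
    """Yield intervals from TSV lines whose chrom/start/end columns are configurable.

    Column numbers are 1-based, like tabix's -s/-b/-e options.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(meta_char):
            continue  # skip blank and meta/header lines
        fields = line.split("\t")
        start = int(fields[begin_col - 1])
        end = int(fields[end_col - 1])
        if not zero_based:
            start -= 1  # 1-based closed -> 0-based half-open
        yield Interval(fields[seq_col - 1], start, end, line)
```

Normalizing everything to 0-based half-open intervals lets downstream code apply one overlap rule regardless of whether the source file was GFF-like or BED-like.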

pkerpedjiev · 2026-03-17T04:26:28Z

Yeah, I think I can use both of those. I'll try updating the clodius PR with those changes, and reopen and modify this PR if that doesn't work.

pkerpedjiev force-pushed the feat/tabix-bam-references branch from cf35486 to db333a7 on March 15, 2026 23:29

pkerpedjiev and others added 4 commits March 15, 2026 17:03

fix: apply rustfmt formatting to alignment import in py-oxbow lib.rs

e909f6f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add tests for read_bam_references and read_tabix

e2a4270

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: remove unused import and apply ruff formatting to test_scanners.py

ad7efce

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: use pa.ipc.open_file to decode Arrow IPC file format in tests

05e9550

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pkerpedjiev closed this Mar 17, 2026

pkerpedjiev wants to merge 5 commits into abdenlab:main from pkerpedjiev:feat/tabix-bam-references